
    Local Ranking Problem on the BrowseGraph

    The "Local Ranking Problem" (LRP) concerns the computation of a centrality-like rank on a local graph, where node scores may differ significantly from those computed on the global graph. Previous work has studied LRP on the hyperlink graph but never on the BrowseGraph, a graph whose nodes are webpages and whose edges are browsing transitions. This graph has recently received increasing attention in tasks such as ranking, prediction and recommendation. However, a web server only observes the browsing traffic on its own pages (the local BrowseGraph), so the local computation can lead to estimation errors, which hinders the growing number of applications that rely on it. Moreover, although the divergence between the local and global ranks has been measured, the possibility of estimating such divergence using only local knowledge has been largely overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources based only on their local knowledge, and (ii) take into account real user browsing fluxes, which capture actual user interest better than the static hyperlink network. We study LRP on a BrowseGraph from a large news provider, taking as subgraphs the aggregated browsing traces of users coming from different domains. We show that the distance between the rankings can be accurately predicted from structural information of the local graph alone, achieving an average rank correlation as high as 0.8.
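
    A minimal sketch of the kind of local-versus-global comparison described above: PageRank (one possible centrality) is computed on the full browse graph and on a local subgraph, and the divergence between the two rankings is summarized by a Spearman rank correlation. The random graph, the choice of PageRank, and the correlation measure are illustrative assumptions, not the paper's exact pipeline.

        import networkx as nx
        from scipy.stats import spearmanr

        def local_vs_global_rank_correlation(global_graph, local_nodes):
            """Correlate node ranks computed on a local subgraph vs. the full graph."""
            local_graph = global_graph.subgraph(local_nodes)
            global_scores = nx.pagerank(global_graph)
            local_scores = nx.pagerank(local_graph)
            shared = sorted(local_scores)                  # nodes visible locally
            rho, _ = spearmanr([global_scores[n] for n in shared],
                               [local_scores[n] for n in shared])
            return rho

        # Toy stand-in for a browse graph: a random directed graph.
        G = nx.gnp_random_graph(200, 0.05, seed=1, directed=True)
        print(local_vs_global_rank_correlation(G, list(range(50))))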

    A framework for space-efficient string kernels

    String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient or incur large slowdowns. We show that a number of exact string kernels, like the k-mer kernel, the substring kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in O(nd) time and in o(n) bits of space in addition to the input, using just a rangeDistinct data structure on the Burrows-Wheeler transform of the input strings, which takes O(d) time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of k, like the k-mer profile and the k-th order empirical entropy, and for calibrating the value of k using the data.
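
    As a point of reference, the k-mer (spectrum) kernel mentioned above counts the k-mers shared by two strings. The naive hash-based computation below is easy to read but is exactly the kind of space-hungry approach the paper's BWT-based framework is designed to avoid; the strings and the value of k are illustrative.

        from collections import Counter

        def kmer_kernel(x: str, y: str, k: int) -> int:
            """Naive k-mer kernel: sum over k-mers w of count_x(w) * count_y(w)."""
            cx = Counter(x[i:i + k] for i in range(len(x) - k + 1))
            cy = Counter(y[i:i + k] for i in range(len(y) - k + 1))
            # Only k-mers occurring in both strings contribute to the inner product.
            return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())

        print(kmer_kernel("ACGTACGT", "ACGTTT", k=3))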

    On landmark selection and sampling in high-dimensional data analysis

    In recent years, the spectral analysis of appropriately defined kernel matrices has emerged as a principled way to extract the low-dimensional structure often prevalent in high-dimensional data. Here we provide an introduction to spectral methods for linear and nonlinear dimension reduction, emphasizing ways to overcome the computational limitations currently faced by practitioners with massive datasets. In particular, a data subsampling or landmark selection process is often employed to construct a kernel based on partial information, followed by an approximate spectral analysis termed the Nyström extension. We provide a quantitative framework to analyse this procedure, and use it to demonstrate algorithmic performance bounds on a range of practical approaches designed to optimize the landmark selection process. We compare the practical implications of these bounds by way of real-world examples drawn from the field of computer vision, whereby low-dimensional manifold structure is shown to emerge from high-dimensional video data streams. Comment: 18 pages, 6 figures, submitted for publication.
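
    A minimal numpy sketch of the landmark-based approximation described above: a full kernel matrix is approximated from a random subset of landmark columns, in the spirit of the Nyström extension. The RBF kernel, uniform landmark sampling, and the use of a pseudo-inverse are assumptions for illustration, not the specific approaches analysed in the paper.

        import numpy as np

        def rbf_kernel(A, B, gamma=1.0):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-gamma * d2)

        def nystrom_approximation(X, n_landmarks, gamma=1.0, seed=0):
            rng = np.random.default_rng(seed)
            idx = rng.choice(len(X), size=n_landmarks, replace=False)  # landmarks
            C = rbf_kernel(X, X[idx], gamma)     # n x m slice of the kernel matrix
            W = C[idx]                           # m x m block between landmarks
            # Nystrom approximation: K is approximated by C W^+ C^T.
            return C @ np.linalg.pinv(W) @ C.T

        X = np.random.default_rng(1).normal(size=(400, 10))
        K = rbf_kernel(X, X)
        K_hat = nystrom_approximation(X, n_landmarks=50)
        print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))   # relative error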

    Neuropathology in COVID-19 autopsies is defined by microglial activation and lesions of the white matter with emphasis in cerebellar and brain stem areas

    Introduction: This study aimed to investigate microglial and macrophage activation in 17 patients who died in the context of a COVID-19 infection in 2020 and 2021. Methods: Through immunohistochemical analysis, the lysosomal marker CD68 was used to detect diffuse parenchymal microglial activity, pronounced perivascular macrophage activation and macrophage clusters. COVID-19 patients were compared to control patients and grouped with regard to clinical aspects. Detection of viral proteins was attempted in different regions using multiple commercially available antibodies. Results: Microglial and macrophage activation was most pronounced in the white matter, with emphasis on brain stem and cerebellar areas. Analysis of lesion patterns yielded no correlation between disease severity and neuropathological changes. The occurrence of macrophage clusters could not be associated with a severe course of disease or with preconditions, but appears to represent a more advanced stage of microglial and macrophage activation. Severe neuropathological changes in COVID-19 were comparable to those in severe influenza. Hypoxic damage was not a confounder of the described neuropathology. The macrophage/microglia reaction was less pronounced in post-COVID-19 patients, but still detectable, e.g. in the brain stem. Commercially available antibodies for the detection of SARS-CoV-2 viral material in immunohistochemistry yielded no specific signal over controls. Conclusion: The presented microglial and macrophage activation might be an explanation for long COVID syndrome.

    Reproducing Kernels of Generalized Sobolev Spaces via a Green Function Approach with Distributional Operators

    In this paper we introduce a generalized Sobolev space by defining a semi-inner product formulated in terms of a vector distributional operator P consisting of finitely or countably many distributional operators P_n, which are defined on the dual space of the Schwartz space. The types of operators we consider include not only differential operators, but also more general distributional operators such as pseudo-differential operators. We deduce that a certain appropriate full-space Green function G with respect to L := P^{*T} P becomes a conditionally positive definite function. In order to support this claim we ensure that the distributional adjoint operator P^* of P is well-defined in the distributional sense. Under sufficient conditions, the native space (reproducing-kernel Hilbert space) associated with the Green function G can be isometrically embedded into, or even be isometrically equivalent to, a generalized Sobolev space. As an application, we take linear combinations of translates of the Green function, with possibly added polynomial terms, and construct a multivariate minimum-norm interpolant s_{f,X} to data values sampled from an unknown generalized Sobolev function f at data sites located in some set X ⊂ R^d. We provide several examples, such as Matérn kernels or Gaussian kernels, that illustrate how many reproducing-kernel Hilbert spaces of well-known reproducing kernels are isometrically equivalent to a generalized Sobolev space. These examples further illustrate how we can rescale the Sobolev spaces by the vector distributional operator P. Introducing the notion of scale as part of the definition of a generalized Sobolev space may help us to choose the "best" kernel function for kernel-based approximation methods. Comment: updated version of the publication in Numer. Math., close to Qi Ye's Ph.D. thesis (http://mypages.iit.edu/~qye3/PhdThesis-2012-AMS-QiYe-IIT.pdf).
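
    A small numpy sketch of the minimum-norm kernel interpolant s_{f,X} described above: s_{f,X}(x) = sum_j c_j K(x, x_j), with coefficients obtained by solving the kernel system on the data sites. A Gaussian kernel stands in for the Green function G, and the shape parameter, test function and data sites are illustrative assumptions.

        import numpy as np

        def gaussian_kernel(A, B, eps=2.0):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-(eps ** 2) * d2)

        def kernel_interpolant(X, fX, eps=2.0):
            K = gaussian_kernel(X, X, eps)
            coeffs = np.linalg.solve(K, fX)           # minimum-norm coefficients
            return lambda Xnew: gaussian_kernel(Xnew, X, eps) @ coeffs

        rng = np.random.default_rng(0)
        X = rng.uniform(-1, 1, size=(40, 2))          # data sites in R^2
        f = lambda Z: np.sin(np.pi * Z[:, 0]) * Z[:, 1]
        s = kernel_interpolant(X, f(X))
        Xtest = rng.uniform(-1, 1, size=(5, 2))
        print(np.abs(s(Xtest) - f(Xtest)))            # pointwise interpolation error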

    Statistical Mechanical Development of a Sparse Bayesian Classifier

    The demand for extracting rules from high-dimensional real-world data is increasing in various fields. However, the possible redundancy of such data sometimes makes it difficult to obtain good generalization ability for novel samples. To resolve this problem, we provide a scheme that reduces the effective dimension of the data by pruning redundant components for bicategorical classification based on the Bayesian framework. First, the potential of the proposed method is confirmed in ideal situations using the replica method. Unfortunately, performing the scheme exactly is computationally difficult, so we next develop a tractable approximation algorithm, which turns out to offer nearly optimal performance in ideal cases when the system size is large. Finally, the efficacy of the developed classifier is experimentally examined on a real-world problem of colon cancer classification, which shows that the developed method can be practically useful. Comment: 13 pages, 6 figures.
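
    The abstract does not spell out the replica-based pruning scheme, so the snippet below is only a loosely related illustration of the general idea of pruning redundant components in binary classification: an L1-penalized logistic regression (a different, substituted technique) drives the weights of uninformative dimensions to zero on synthetic data.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n, d, d_informative = 200, 500, 10            # many redundant dimensions
        X = rng.normal(size=(n, d))
        w_true = np.zeros(d)
        w_true[:d_informative] = rng.normal(size=d_informative)
        y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(int)

        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
        kept = np.flatnonzero(clf.coef_[0])           # surviving (unpruned) components
        print(f"kept {kept.size} of {d} dimensions")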

    Knot selection in sparse Gaussian processes with a variational objective function

    Sparse, knot-based Gaussian processes have enjoyed considerable success as scalable approximations of full Gaussian processes. Certain sparse models can be derived through specific variational approximations to the true posterior, and knots can be selected to minimize the Kullback-Leibler divergence between the approximate and true posterior. While this has been a successful approach, simultaneous optimization of the knots can be slow due to the number of parameters being optimized. Furthermore, few methods have been proposed for selecting the number of knots, and no experimental results exist in the literature. We propose a one-at-a-time knot selection algorithm based on Bayesian optimization to select both the number and the locations of the knots. We showcase the competitive performance of this method relative to simultaneous optimization of the knots on three benchmark datasets, achieved at a fraction of the computational cost.
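
    A simplified sketch of one-at-a-time knot selection under a variational objective: candidate knots are scored with the standard collapsed variational lower bound for a sparse GP (the Titsias ELBO) and added greedily. This greedy grid search is only a stand-in for the Bayesian-optimization procedure the abstract proposes; the RBF kernel, noise level and data are assumptions.

        import numpy as np

        def rbf(A, B, ell=1.0):
            return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell ** 2)

        def titsias_elbo(X, y, Z, noise=0.1, ell=1.0):
            """Collapsed variational lower bound for a sparse GP with knots Z."""
            n = len(X)
            Kmm = rbf(Z, Z, ell) + 1e-8 * np.eye(len(Z))
            Knm = rbf(X, Z, ell)
            Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)
            S = Qnn + noise ** 2 * np.eye(n)
            _, logdet = np.linalg.slogdet(S)
            log_marginal = -0.5 * (n * np.log(2 * np.pi) + logdet
                                   + y @ np.linalg.solve(S, y))
            trace_term = (np.ones(n) - np.diag(Qnn)).sum() / (2 * noise ** 2)  # k(x,x)=1
            return log_marginal - trace_term

        def greedy_knot_selection(X, y, candidates, n_knots):
            knots, remaining = [], list(candidates)
            for _ in range(n_knots):
                scores = [titsias_elbo(X, y, np.array(knots + [c])) for c in remaining]
                knots.append(remaining.pop(int(np.argmax(scores))))
            return np.array(knots)

        rng = np.random.default_rng(0)
        X = np.sort(rng.uniform(0, 10, 200))
        y = np.sin(X) + 0.1 * rng.normal(size=X.size)
        print(greedy_knot_selection(X, y, list(np.linspace(0, 10, 40)), n_knots=6))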

    Larger Residuals, Less Work: Active Document Scheduling for Latent Dirichlet Allocation


    Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis

    SHAPE chemistries exploit small electrophilic reagents that react with the 2′-hydroxyl group to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by exploiting the ability of reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as that of simple model RNAs. This protocol describes the experimental steps, implemented over three days, required to perform SHAPE probing and construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. These steps include RNA folding and SHAPE structure probing, mutational profiling by reverse transcription, library construction, and sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides useful troubleshooting information, often within an hour. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures, and visualize probable and alternative helices, often in under a day. We illustrate these algorithms with the E. coli thiamine pyrophosphate riboswitch, E. coli 16S rRNA, and HIV-1 genomic RNAs. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles, and entire transcriptomes. The straightforward MaP strategy greatly expands the number, length, and complexity of analyzable RNA structures.
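
    A simplified illustration of the arithmetic behind the mutational-profiling step: per-nucleotide mutation rates are computed for modified, untreated and denatured samples, and reactivity is taken as (modified - untreated) / denatured. The real ShapeMapper pipeline adds read alignment, quality filtering and normalization on top of this; the counts below are invented toy numbers.

        import numpy as np

        def mutation_rate(mutations, read_depth):
            """Per-nucleotide mutation rate, with zero depth mapped to NaN."""
            depth = np.asarray(read_depth, dtype=float)
            return np.divide(mutations, depth, out=np.full(depth.shape, np.nan),
                             where=depth > 0)

        def shape_reactivity(mod, untreated, denatured):
            """Background-subtracted, denaturing-normalized SHAPE reactivity."""
            r_mod = mutation_rate(*mod)
            r_unt = mutation_rate(*untreated)
            r_den = mutation_rate(*denatured)
            return (r_mod - r_unt) / r_den

        # Toy counts for a 5-nt RNA: (mutation counts, read depths) per sample.
        mod = ([50, 5, 80, 2, 30], [1000] * 5)
        untreated = ([5, 4, 6, 2, 5], [1000] * 5)
        denatured = ([100, 90, 110, 95, 105], [1000] * 5)
        print(shape_reactivity(mod, untreated, denatured))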

    Applications of Polynomial Chaos-Based Cokriging to Aerodynamic Design Optimization Benchmark Problems

    In this work, polynomial chaos-based Cokriging (PC-Cokriging) is applied to a benchmark aerodynamic design optimization problem. The aim is to perform fast design optimization using this multifidelity metamodel. Multifidelity metamodels use information at multiple levels of fidelity to make accurate and fast predictions: a larger amount of lower-fidelity data can provide important trend information to complement a limited amount of high-fidelity (HF) data. The PC-Cokriging metamodel is a multivariate version of the polynomial chaos-based Kriging (PC-Kriging) metamodel, and its construction is similar to Cokriging. It combines the advantages of the interpolation-based Kriging metamodel and the regression-based polynomial chaos expansions (PCE). In this work the PC-Cokriging model is compared to other metamodels, namely PCE, Kriging, PC-Kriging and Cokriging. These metamodels are first compared in terms of global accuracy, measured by root mean squared error (RMSE) and normalized RMSE (NRMSE), for different sample sets, each with an increasing number of HF samples. The metamodels are then used to find the optimum. Once the optimum design is found, computational fluid dynamics (CFD) simulations are rerun and the results are compared to each other. In this study a drag reduction of 73.1 counts was achieved. The multifidelity metamodels required 19 HF samples along with 1,055 low-fidelity samples to converge to the optimum drag value of 129 counts, while the single-fidelity models required 155 HF samples to do the same.
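
    For reference, the two accuracy metrics named above can be computed as follows. The NRMSE normalization used here (dividing RMSE by the range of the reference values) is an assumed convention, since the abstract does not specify one; the numbers are invented.

        import numpy as np

        def rmse(y_true, y_pred):
            y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
            return np.sqrt(np.mean((y_true - y_pred) ** 2))

        def nrmse(y_true, y_pred):
            y_true = np.asarray(y_true)
            # Normalize by the range of the reference values (assumed convention).
            return rmse(y_true, y_pred) / (y_true.max() - y_true.min())

        # Toy check of a surrogate's predictions against reference values.
        y_cfd = np.array([129.0, 145.2, 160.8, 180.5])        # e.g. drag counts
        y_surrogate = np.array([131.1, 144.0, 162.3, 178.9])
        print(rmse(y_cfd, y_surrogate), nrmse(y_cfd, y_surrogate))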